
SSPred: A prediction server based on SVM for the identification and classification of proteins involved in bacterial secretion systems



Abstract

Protein secretion systems, used by almost all bacteria, are highly significant for the normal existence of bacteria and for their interaction with their hosts. The accumulation of genome sequence data in the past few years has provided great insight into the distribution and function of these secretion systems. In this study, a support vector machine (SVM)-based method, SSPred, was developed for the automated functional annotation of proteins involved in secretion systems. SSPred further classifies these proteins into five major sub-types (Type-I, Type-II, Type-III, Type-IV and Sec systems). The dataset used for training and testing was obtained from the KEGG and SwissProt databases and was curated to avoid redundancy. The positive and negative datasets were imbalanced. To overcome this, an ensemble of SVM modules was used, with each module trained on a balanced subset of the training data. First, protein sequence features such as amino-acid composition (AAC), dipeptide composition (DPC) and physico-chemical composition (PCC) were used to develop SVM-based modules, which achieved average accuracies of 84%, 85.17% and 82.59%, respectively. Second, a hybrid module (hybrid-I) integrating all of these features was developed, which achieved an average accuracy of 86.12%. Another hybrid module (hybrid-II) combined amino-acid composition with evolutionary information extracted from the position-specific scoring matrix (PSSM) of a protein sequence. Hybrid-II achieved the highest average accuracy, 89.73%. On unbiased evaluation using an independent dataset, SSPred showed good prediction performance in the identification and classification of secretion systems. SSPred is freely available as a World Wide Web server at http://www.bioinformatics.org/sspred.
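The abstract names the AAC and DPC feature encodings and the balanced-subset ensemble but gives no implementation details. The following is a minimal Python sketch under stated assumptions, not SSPred's actual code:

- Compositions are normalized as fractions (the original may use percentages).
- Non-standard residues are skipped.
- The negative set is split into chunks the size of the positive set.
- Member votes are combined by simple majority, with ties counted as positive.

The function names `aac`, `dpc`, `balanced_subsets` and `majority_vote` are hypothetical. PCC and PSSM features are omitted because the abstract does not specify the physico-chemical groupings or how the PSSM was generated.

```python
from itertools import product
import random

# The 20 standard amino acids in a fixed order so feature indices are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def aac(seq: str) -> list[float]:
    """Amino-acid composition: fraction of each standard residue (20 values).

    Assumption: non-standard residues (e.g. X, B, Z) are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def balanced_subsets(positives, negatives, seed=0):
    """Shuffle the (larger) negative set and cut it into positive-sized chunks.

    Each returned (positives, negative_chunk) pair is a balanced training set
    for one ensemble member. The final chunk may be smaller if the counts do
    not divide evenly (a hypothetical choice; SSPred's exact scheme is unknown).
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    k = len(positives)
    return [(list(positives), shuffled[i:i + k]) for i in range(0, len(shuffled), k)]


def majority_vote(predictions):
    """Combine binary (0/1) votes from ensemble members; ties count as positive."""
    return int(sum(predictions) * 2 >= len(predictions))
```

Concatenating `aac(seq) + dpc(seq)` gives a 420-dimensional vector in the spirit of a hybrid feature set. Each balanced pair from `balanced_subsets` would then train one SVM (for example, scikit-learn's `SVC`, not shown here). The members' 0/1 predictions would finally be combined with `majority_vote`.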

Bibliographic details

  • Authors

    Pundhir, Sachin; Kumar, Anil

  • Year: 2011
  • Original format: PDF
  • Language: English
